For embedded applications with data-level parallelism, a vector processor offers high performance at low power consumption and low design complexity. Unlike superscalar and VLIW designs, a vector processor is scalable and can optimally match specific

Authors

  • Christoforos E. Kozyrakis
  • David A. Patterson
Abstract

Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-bandwidth telecommunications. Consequently, modern embedded processors must provide high performance in addition to low cost. They must also be easy to scale and customize to meet the rigorous time-to-market requirements for consumer electronic products.

The conventional wisdom for high-performance embedded processors is to use the superscalar or very long instruction word (VLIW) paradigms developed for desktop computing. Both approaches exploit instruction-level parallelism (ILP) in applications to execute a few operations in parallel per cycle. Superscalar processors detect ILP dynamically in hardware, which increases power consumption and complexity. VLIW processors rely on the compiler to detect ILP, which increases code size. Both approaches are difficult to scale because they require either significant hardware redesign (superscalar) or instruction-set redefinition (VLIW). Furthermore, scaling up either of the two exacerbates its initial disadvantages.

This article advocates an alternative approach to embedded processors that provides high performance for critical tasks without sacrificing power efficiency or design simplicity. The key observation is that multimedia and telecommunications tasks contain large amounts of data-level parallelism (DLP). Hence, it’s not surprising that we revisit vector architectures, the paradigm developed for high performance with the large-scale DLP available in scientific computations.
Just as superscalar and VLIW processors for desktop systems were adjusted to accommodate embedded designs, we can revise vector architectures for supercomputers to serve in embedded applications. To demonstrate that vector architectures meet the requirements of embedded media processing, we evaluate the Vector IRAM, or VIRAM (pronounced “V-IRAM”), architecture developed at UC Berkeley, using benchmarks from the Embedded Microprocessor ...
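
To make the notion of data-level parallelism concrete, the following is a minimal sketch of a media-style kernel (a saturating add over pixel values) written first as a scalar loop and then in strip-mined form, where each outer iteration stands for one vector instruction applied to a group of independent elements. The function names, the maximum vector length of 8, and the saturation limit of 255 are illustrative assumptions, not parameters taken from the VIRAM architecture.

```python
def saturating_add_scalar(a, b, limit=255):
    """Scalar version: one element is processed per loop iteration."""
    out = []
    for x, y in zip(a, b):
        out.append(min(x + y, limit))
    return out


def saturating_add_strip_mined(a, b, max_vl=8, limit=255):
    """Strip-mined version.

    Each outer iteration models ONE vector instruction that operates on up
    to max_vl independent elements. max_vl is a hypothetical maximum vector
    length chosen only for illustration.
    Returns the result and the number of modeled vector operations.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    vector_ops = 0
    for start in range(0, len(a), max_vl):
        vl = min(max_vl, len(a) - start)  # last strip may be shorter
        # The elements of a strip do not depend on one another, so real
        # vector hardware could compute them in parallel across its lanes.
        out.extend(min(a[start + i] + b[start + i], limit) for i in range(vl))
        vector_ops += 1
    return out, vector_ops


if __name__ == "__main__":
    pixels = list(range(0, 200, 10))   # 20 sample values
    offset = [100] * len(pixels)
    result, ops = saturating_add_strip_mined(pixels, offset)
    print(result == saturating_add_scalar(pixels, offset), ops)
```

In this sketch the 20-element input is covered by three modeled vector operations instead of 20 scalar iterations. Because the iterations are independent, the work per instruction grows with the vector length without any change to the program, which is the kind of scaling the abstract contrasts with superscalar and VLIW designs.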


Similar resources

Scalable Vector Processors for Embedded Systems

Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...


C-slow Technique vs Multiprocessor in designing Low Area Customized Instruction set Processor for Embedded Applications

The demand for high-performance embedded processors for consumer electronics has been increasing rapidly over the past few years. Many of these embedded processors depend upon a custom-built Instruction Set Architecture (ISA), such as game processors (GPU), multimedia processors, DSP processors, etc. The primary requirement of the consumer electronics industry is low cost with high performance and low power cons...


Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work, as a first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...


Evaluating Compiler Support for Complexity Effective Network Processing

Statically scheduled processors are known to enable low-complexity hardware implementations that lead to reduced design and verification time. However, statically scheduled processors are critically dependent on the compiler to exploit instruction-level parallelism and deliver higher performance. In order to ascertain the suitability of statically scheduled processors for network processing (wh...


Hardware/Compiler Co-development for an Embedded Media Processor

Embedded and portable systems running multimedia applications create a new challenge for hardware architects. The microprocessor needed for such systems is a merged general-purpose processor and digital-signal processor, with the programmability of the former and the performance and power budget of the latter. This paper presents the co-development of the instruction set, the hardware, and the com...




Publication date: 2004